Targets and Industries





Kerry Back

Transforming the target variable

  • We should usually deal with outliers for the target variable.
  • We’ve used ranks, but there are more systematic ways.
  • For example, we could use QuantileTransformer.
  • When we predict with the model, we get “untransformed” predictions (predicting original target)

Usual example

  • roeq and mom12m in 2021-12
  • RandomForestRegressor
  • transform1, poly, transform2
  • Use GridSearchCV to find max_depth

Changes

  • Instead of model=RandomForestRegressor, use model = TransformedTargetRegressor(…)
    • input1: regressor
    • input2: transformer
  • param_grid = {“transformedtargetregressor__regressor__max_depth”: [4, 6, 8]}
  • pipe = … (as before)
  • cv = … (as before)

from sklearn.compose import TransformedTargetRegressor

transform3 = QuantileTransformer(
    output_distribution="normal"
)

model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(random_state=0),
    transformer=transform3
)

pipe = make_pipeline(
    transform1,
    poly,
    transform2,
    model
)

param_grid = {
    "transformedtargetregressor__regressor__max_depth": [4, 6, 8]
}

cv = GridSearchCV(
    pipe, 
    param_grid=param_grid
)

X = data[["roeq", "mom12m"]]
y = data["ret"]

cv.fit(X, y)

Exercise

Run GridSearchCV with

  • MLP Regressor,
  • hidden layer sizes = (16, 8, 4, 2) and (8, 4, 2) ,
  • and a transformed target variable.

Industries

  • Used OneHotEncoder to create dummy variables
  • Let’s also use deviations from industry means as predictors
  • E.g., maybe we want to buy stocks with high ROEs relative to their industry rather than just high ROE stocks in general

Calculating deviations from means

  • One industry at a time:
data["roeqx"] = data.groupby("industry").roeq.transform(
    lambda x: x - x.mean()
)
data["mom12mx"] = data.groupby("industry").mom12m.transform(
    lambda x: x - x.mean()
)
  • Or in a loop:
for char in ["roeq", "mom12m"]:
    data[char+"x"] = data.groupby("industry")[char].transform(
        lambda x: x - x.mean()
    )

Industry momentum

  • We could add industry means as additional features.
  • Probably most important for momentum. Industry momentum seems to be a significant phenomenom.
  • Could compute momentum of industry returns. But average momentum within the industry is easier and probably close.
data["mom12mi"] = data.groupby("industry").mom12m.mean()

Two versions: with and without industry dummies

  • In both versions:
transform3 = QuantileTransformer(
    output_distribution="normal"
)
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(random_state=0),
    transformer=transform3
)

Without dummies:

pipe = make_pipeline(
    transform1,
    poly,
    transform2,
    model
)
X = data[["roeq", "mom12m", "roeqx", "mom12mx", "mom12mi"]]

With dummies

numeric_transform = make_pipeline(
  transform1,
  poly,
  transform2
)

category_transform = OneHotEncoder()

feature_transform = make_column_transformer(
  (numeric_transform, ["roeq", "mom12m", "roeqx", "mom12mx", "mom12mi"]),
  (category_transform, ["industry"])
)

pipe = make_pipeline(
  feature_transform,
  model
)

X = data[["roeq", "mom12m", "roeqx", "mom12mx", "mom12mi", "industry"]]